"SOFTWARE SYSTEM FOR DEPLOYING IMAGE PROCESSING FUNCTIONS ON A PROGRAMMABLE
PLATFORM OF DISTRIBUTED PROCESSOR ENVIRONMENTS"



Description 


Field of the Invention 

The invention relates to a Software System, referred to as Image Transport Engine, for processing a sequence of images by deploying Image Processing Functions onto a multiprocessor system called Platform, said Platform generating input image data in order to provide processed output image data. Deployment is the operation of mapping software components onto hardware components.

The invention finds particular application in the field of Software Systems designed for processing sequences of medical X-ray images.

Background of the Invention 

A software package that enables several processes to collaborate on the same image processing tasks, using multiprocessor-based signal processing systems that form a distributed programmable platform, is already known from the publication entitled "A New Development Framework Based on Efficient Middleware for Real-Time Embedded Heterogeneous Multi-computers" by Randal JANKA, in: "Proceedings of the IEEE Conference and Workshop on Engineering of Computer-Based Systems, Nashville, TN, USA, 7-12 March 1999", pp. 261-268. According to the above-cited publication, signal processing applications are growing in complexity and now require "scalable heterogeneous multi-computers" to achieve satisfactory performance. Multiprocessor hardware targets are therefore increasing in computational throughput and inter-processor communication bandwidth to satisfy these requirements. Software that eases the difficulty of such development is known as "Middleware": general-purpose software that sits in a layer above a standard operating system and below the application software, i.e. between a platform and the application. The platform is a set of processing elements defined by processor architectures and operating system application programming interfaces. The Middleware is defined by the application programming interfaces and protocols that it supports. It may have multiple implementations that conform to its interface and protocol specifications, depending on the number and type of the hardware targets it serves. A "Framework" is a software environment designed to simplify application development and system management for a specialized application domain, such as real-time signal processing, and is defined by an application programming interface, a user interface, and Middlewares. A Middleware called TALARIS, from MERCURY Computer Systems, is considered in the cited publication. A Framework called "PeakWare for RACE", referred to as PW4R, is layered on top of the TALARIS Middleware, which is designed to support the integration of graphical development tools with scalable heterogeneous systems. PW4R allows the developer to graphically specify the software and hardware of the application, and then to map the software onto the hardware. Thus, PW4R is a high-level means that gives the MERCURY Computer System improved flexibility, i.e. the ability to be easily upgraded, which was not the case for previous systems using low-level means. Many different algorithmic configurations might be needed in such a Computer System. Also, complex algorithms might be combined with, or connected to, other algorithms newly introduced on the platform. Moreover, the platform might receive new computing means, or some computing means might be exchanged or eliminated. PW4R is a solution to this flexibility problem.



In digital image processing, and in particular in medical digital X-ray image sequence processing, processors are used to perform the algorithmic image treatments required during real-time operation. Because of the computing power this entails, several processors distributed on a platform must collaborate on the same task. The known Framework PW4R is not a solution to the problems of processing sequences of medical digital X-ray images, for the reasons explained below.

A first major problem regarding such an image processing system is related to latency, which is defined as the time necessary to produce a processed pixel with respect to the instant when the system starts to process this pixel. For instance, a medical system may be used to acquire a sequence of images and to produce processed images of said sequence during a cardiology procedure in which a doctor introduces a catheter into a patient's artery and follows the progression of the catheter in said artery on a screen. In this case, it is particularly important that the medical system deliver close image-to-doctor feedback. When the doctor acts on the catheter, the result of this action must be perceived on the screen after a very small delay. An admissible delay is, for example, 80 ms, which corresponds to 2 image frames in a sequence acquired at a rate of 25 images per second. To this end, the total computation latency of the image processing system must be reduced.
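
The admissible-delay arithmetic above can be restated as a short sketch. The helper names `frame_period_ms` and `latency_in_frames` are hypothetical and only reproduce the 80 ms / 25 images-per-second example.

```python
def frame_period_ms(images_per_second: float) -> float:
    """Time between two consecutive images of the sequence, in milliseconds."""
    return 1000.0 / images_per_second


def latency_in_frames(delay_ms: float, images_per_second: float) -> float:
    """Express an admissible delay as a number of image frames."""
    return delay_ms / frame_period_ms(images_per_second)


if __name__ == "__main__":
    print(frame_period_ms(25))        # 40.0 ms between images
    print(latency_in_frames(80, 25))  # 2.0 frames
```

At 30 images per second the same 80 ms budget amounts to about 2.4 frames, so the frame interval, not the absolute delay, is the natural unit for latency objectives.
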

A second major problem regarding such image processing is related to the number of pixels to be transferred within the admissible delay. Because of the real-time operation, this amounts to the order of mega-pixels to be processed per second.

The PW4R associated with the Computer System disclosed in the above-cited publication does not propose solutions to these two major problems. Designing a system that solves these problems requires a precise specification of a computation model suited to the image processing application field. This computation model includes the design of efficient mechanisms to achieve the necessary performance described above.

Another problem lies in the use of memory means shared between the different processors distributed on the platform. If only one memory were shared by all the processors, contentions would result. Arbitration means would be needed to control these contentions, resulting in latency. Another problem, specific to image processing, lies in the fact that the image is processed either pixel by pixel, which is not efficient, or image by image, which is efficient regarding transfer but not regarding latency. Another problem lies in the fact that many different algorithmic configurations might be needed, so a method to automate the operation of those various possible configurations is necessary.



Summary of the Invention 




The object of the invention is to provide a Software System, referred to as Image Transport Engine, for processing a sequence of images by deploying Image Processing Functions onto a multiprocessor system called Platform, said Platform generating input image data in order to provide processed output image data.

This object is achieved by an Image Transport Engine as claimed in Claim 1 and in the dependent Claims. This Image Transport Engine comprises a software data partitioning model, referred to as Communication Pattern, which partitions the images of the sequence into time-stamped data packets, the transfer of which may overlap the execution of said Image Processing Functions.

The Image Transport Engine according to the invention presents several advantages. It is able to perform the deployment of said Image Processing Functions on the platform automatically and flexibly. It makes it possible to minimize latency; in particular, it is efficient regarding both transfer and latency. It makes it possible to deploy several algorithms carrying out different Image Processing Functions. It is particularly appropriate for medical X-ray image sequence processing applications. In an embodiment, the invention proposes means that solve the problem of contentions.

Brief Description of the Drawings 

The invention is described hereafter in detail with reference to diagrammatic figures, wherein:
- FIG.1A is a block diagram of a pipeline structure of the Communication Pattern;
- FIG.1B is a block diagram of a scatter/gather structure of the Communication Pattern;
- FIG.1C is a block diagram of a branch structure of the Communication Pattern;
- FIG.1D is a block diagram of a wide-band structure of the Communication Pattern;
- FIG.2 illustrates pipelining transmission without overlapping;
- FIG.3A illustrates pipelining transmission with overlapping;
- FIG.3B illustrates scattering transmission with overlapping;
- FIG.3C illustrates gathering transmission with overlapping;
- FIG.3D illustrates branch-connection transmission with overlapping;
- FIG.4 is a block diagram of a medical apparatus having processing means for using the Image Transport Engine.

Description of Embodiments

The invention relates to a Software System, referred to as Image Transport Engine, based on a Software Component referred to as Communication Pattern, which lies on top of a programmable hardware architecture called Platform. The Image Transport Engine performs the deployment of one or several algorithmic chain(s) of Processing Functions on said platform. The platform includes a set of distributed processor environments, labeled CE, linked by paths. In an example of embodiment, said Image Transport Engine is described hereafter for the automatic implementation of digital X-ray Image Processing Functions, labeled IP, on the platform. Efficiency, upgradability, flexibility, low latency and user-friendliness are some of the characteristics of this Image Transport Engine.





This Image Transport Engine is subject to constraints arising from its application field, namely image post-processing for X-ray systems. For instance, regarding input data: the image size may be of the order of 1K x 1K pixels of 16 bits, and the image rate of the order of 25 or 30 images per second. The scanning of the image is progressive, the data arriving as horizontal lines. The Image Transport Engine admits one live input only. The image size and image rate might change, but not in the course of an image sequence. Regarding output data, the image size may be 1K x 1K pixels of 16 bits, to be displayed or stored. The display may be a medical monitor. Image storage on disk might be required at various levels of the processing chain. In the processing chain, several algorithms can be combined. It is an object of the proposed Image Transport Engine that the latency be less than one frame time-interval; more latency may be admitted when algorithms require it. The images are formed in series, and all images of a given sequence must be processed so that there is no image loss. This Image Transport Engine also has the ability to compare different algorithms. The parameters may change between images. The algorithms to be deployed on the platform may comprise: Noise reduction processes including temporal, spatial or spatio-temporal processing functions; Contrast Enhancement or Edge Enhancement at multi-resolution; Image subtraction; Zoom with Interpolation; Image rotation; Motion Estimation; Structure detection; Structure extraction, as known to those skilled in the art. This list, given by way of example, is not exhaustive. The algorithms may be non-linear, may have different spatial breadths, i.e. neighborhoods, and may be causal or anti-causal. The algorithm complexity may amount to about 100 operations per pixel.
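
To make these orders of magnitude concrete, the following sketch computes the input data rate and the compute load for the figures quoted above (1K x 1K images of 16-bit pixels, 25 images per second, about 100 operations per pixel). The helper names are hypothetical, and 1K is taken as 1024.

```python
def input_data_rate(width: int, height: int, bytes_per_pixel: int,
                    images_per_second: int) -> tuple:
    """Return (pixels per second, bytes per second) for a live image sequence."""
    pixels_per_second = width * height * images_per_second
    return pixels_per_second, pixels_per_second * bytes_per_pixel


def compute_load(pixels_per_second: int, ops_per_pixel: int) -> int:
    """Operations per second required by an algorithm of given per-pixel complexity."""
    return pixels_per_second * ops_per_pixel


if __name__ == "__main__":
    # 1K x 1K images of 16-bit pixels (2 bytes) at 25 images per second
    pix, byt = input_data_rate(1024, 1024, 2, 25)
    print(pix)                     # 26214400 pixels/s, about 26 Mpixel/s
    print(byt)                     # 52428800 bytes/s, i.e. 50 MiB/s
    print(compute_load(pix, 100))  # 2621440000 operations/s
```

A single algorithm of about 100 operations per pixel thus already requires some 2.6 billion operations per second, which illustrates why several processors must collaborate on the same chain.
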

This list demonstrates that implementing a multiprocessor application for X-ray image processing is highly constrained. Data rate, latency, filter spatial breadth and algorithm diversity make the design of an efficient implementation very delicate. Flexibility and upgradability are constraining factors because, on top of the performance issue, they add a strong requirement for simplicity: algorithm insertion, removal or swapping must be achieved through simple interventions. Also, the Image Transport Engine should be manageable by non-specialists and should not endanger the application integrity. The image size or image rate should be easy to change, and a new, possibly more powerful, processor may be added to the platform. Any modification of the application infrastructure should be manageable through easy intervention.

I) Specification of the Communication Pattern 

The Image Transport Engine is designed to meet all the above-cited requirements. Building an efficient communication infrastructure for a distributed processing application leads to the specification of a Communication Pattern. The overall operation of said Image Transport Engine is automated through the mere specification of the Communication Pattern, so that all data communications and processing controls are realized automatically. The Communication Pattern defined hereafter is able to handle distributed X-ray applications and to provide a user with the means of devising a specific Communication Infrastructure adapted to a given application.

The definition of this Communication Pattern comprises phases of defining the target hardware, including the number of processors, their computing power and their connectivity. The targeted hardware platform may be associated with a commercial operating system and a commercial host processor equipped with extra distributed programmable environments, referred to as Compute Environments, labeled CE, which can exchange data with each other. The Compute Environments are connected to each other by physical programmable data paths called BUS. Each CE contains a processor, its local memory and interfacing means capable of monitoring transfers over the BUS. Each CE is associated with a commercial real-time operating system, which contains basic facilities to manage data transfer and synchronization between CEs. This commercial real-time operating system does not meet the needs for flexibility and upgradability previously defined.

This is the reason why, according to the invention, the Software Component called Image Transport Engine is provided, in order to facilitate the design and coding of parallel applications to the point where only the Image Processing Functions have to be coded and where the whole application infrastructure can be specified and generated from a textual or diagrammatic description. This Image Transport Engine aims to produce applications that are efficient with respect to both latency and fluency. To this end, the Image Transport Engine is based on a Communication Pattern designed for the transfer of Image Data Packets. Latency refers to the average traversing time of a data element through the system from input to output. Fluency refers to the efficiency of data transmission, synchronization and scheduling. Synchronization of an image data packet is efficient when it is very short compared with the image data packet transmission time. The transmission of an image data packet is efficient when it takes full advantage of the bandwidth of the BUS over which the transmission occurs. And the scheduling of an image data packet transmission is efficient when it is unobtrusive with respect to local processing, i.e. when data processing and communication overlap in time.
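
The benefit of temporally overlapping processing and communication can be illustrated with a simplified timing model. The sketch below is an assumption, not the scheduler of the Image Transport Engine: one Module receives n data packets over the BUS, each transfer taking `t` and each call of the IP function taking `p` time units, with two input buffers (double buffering) so that the next packet can be transferred while the current one is processed. The function name `makespan` is hypothetical.

```python
def makespan(n: int, t: float, p: float, overlap: bool) -> float:
    """Time until the last of n packets (n >= 1) has been processed by one Module.

    t: transfer time of one packet over the BUS
    p: processing time of one packet by the IP function
    overlap: if True, double buffering lets transfers overlap processing
    """
    if not overlap:
        # Each packet is first received, then processed: nothing runs concurrently.
        return n * (t + p)
    transfer_end = [0.0] * n
    process_end = [0.0] * n
    for k in range(n):
        # The BUS is busy until the previous transfer completes...
        start_t = transfer_end[k - 1] if k >= 1 else 0.0
        # ...and with two buffers, packet k reuses the buffer freed by packet k-2.
        if k >= 2:
            start_t = max(start_t, process_end[k - 2])
        transfer_end[k] = start_t + t
        # Processing needs the packet fully received and the processor free.
        start_p = max(transfer_end[k], process_end[k - 1] if k >= 1 else 0.0)
        process_end[k] = start_p + p
    return process_end[-1]


if __name__ == "__main__":
    # 32 packets per image; a transfer takes 1 unit, processing takes 2 units
    print(makespan(32, 1.0, 2.0, overlap=False))  # 96.0
    print(makespan(32, 1.0, 2.0, overlap=True))   # 65.0
```

Under this model the overlapped Module spends max(t, p) per packet in steady state instead of t + p, which is one way to read the requirement that scheduling be unobtrusive with respect to local processing.
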

Referring to FIG.1A to FIG.1D, the Communication Pattern is a model enabling the definition of the image processing application. Said Communication Pattern is an oriented software model able to achieve the automation and optimization of data communication and computation over a set of Image Processing Functions IP participating in said image processing application. Said Communication Pattern handles elementary image data packets, which are processed by Software Modules and passed from Module to Module in a regular way. The whole process can be seen as a flow of successive image data packets pushing each other through the model, where they cross the computing units called Modules, which repeatedly process them one after the other. However, before reaching a fluid data flow, the system must start from a frozen state, where every Module is blocked on an initializing condition, and must manage to get going peacefully, that is, while avoiding deadlock and incorrect synchronization. The Communication Pattern is formed of several software components arranged according to a Data Communication Design comprising Nodes, Interfaces and Edges.

The Nodes are the Software Modules, referred to as Modules for simplicity, represented by boxes. The Modules are execution units. The role of a Module is to activate the Image Processing Function labeled IP attached to it and to manage the corresponding data transfers and synchronization. Only one Image Processing Function IP is attached to a given Module. This function IP may in turn call several other functions. The Image Processing Functions IP are defined within the boundary of the process and usually at the image data packet level, in order to ensure low latency. So, an image processing Application is defined as a set of Modules exchanging data, parameter or synchronization information. The Communication Pattern does not exclude multi-processing, i.e. several Modules belonging to the same image processing Application running on the same Processor, nor concurrent image processing Applications, i.e. several distinct image processing Applications sharing the same processors.
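
The role of a Module (activating its single IP function and passing the results on) can be sketched as follows. This is a synchronous, single-threaded illustration under assumed names (`Module`, `push`, `downstream`, `received`); an actual deployment would run Modules concurrently on Compute Environments and would handle transfers and synchronization, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Module:
    """A node of the Communication Pattern with exactly one IP function."""
    name: str
    ip: Callable[[Any], Any]
    downstream: List["Module"] = field(default_factory=list)
    received: List[Any] = field(default_factory=list)

    def push(self, packet: Any) -> None:
        """Receive one image data packet, apply the IP function, forward the result."""
        self.received.append(packet)
        result = self.ip(packet)
        for nxt in self.downstream:
            nxt.push(result)


if __name__ == "__main__":
    sink = Module("SINK", ip=lambda packet: packet)
    mod1 = Module("MOD1", ip=lambda packet: [v * 2 for v in packet], downstream=[sink])
    source = Module("SOURCE", ip=lambda packet: packet, downstream=[mod1])
    for packet in ([1, 2], [3, 4]):  # two tiny image data packets
        source.push(packet)
    print(sink.received)             # [[2, 4], [6, 8]]
```
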

The Edges are links, represented by arrows, that are called Connections. The Communication Pattern defines the logical connectivity associated with the Modules of a given application. The Communication Pattern is oriented so that, over a link, data flow in one direction only and in an acyclic manner: loops are not permitted and no data link is oriented upstream. The arrows representing the Connections are oriented accordingly.
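
One way to check that the Data Connections respect this acyclic orientation is to compute a topological order of the Modules (Kahn's algorithm): if no complete order exists, the pattern contains a loop. The function name and the representation of connections as (producer, consumer) pairs are assumptions for illustration; Parameter Connections, which are bi-directional, are deliberately left out.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def topological_order(modules: Iterable[str],
                      data_connections: Iterable[Tuple[str, str]]) -> List[str]:
    """Return an order of Modules compatible with the arrows, or raise if a loop exists.

    data_connections are (producer, consumer) pairs following the arrow orientation.
    """
    modules = list(modules)
    successors: Dict[str, List[str]] = defaultdict(list)
    in_degree = {m: 0 for m in modules}
    for src, dst in data_connections:
        successors[src].append(dst)
        in_degree[dst] += 1
    ready = deque(m for m in modules if in_degree[m] == 0)
    order = []
    while ready:
        m = ready.popleft()
        order.append(m)
        for nxt in successors[m]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(modules):
        raise ValueError("Data Connection loop detected: the no-loop rule is violated")
    return order


if __name__ == "__main__":
    print(topological_order(
        ["SOURCE", "MOD1", "MOD2", "SINK"],
        [("SOURCE", "MOD1"), ("MOD1", "MOD2"), ("MOD2", "SINK"), ("SOURCE", "SINK")],
    ))  # ['SOURCE', 'MOD1', 'MOD2', 'SINK']
```
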

The Interfaces between Modules and Connections are the Ports. The Modules have a specific internal structure, which allows them to exchange information. The only way of exchanging information with a Module is through its Ports. The Connection orientations indicated by the arrows define Input Ports and Output Ports. When Modules have to communicate with entities external to the Communication Pattern, these entities are referred to as Terminal Ports. The Connections define the mechanism responsible for the information exchange between Module Ports. Logical connections can only be successful if they are associated with physical paths, i.e. physical connections.

Three types of Connection classes are defined. A first type is the Data Connection class, whose Connections deal with image data. The Data Connections are specialized in the transfer of image data packets and are one-way. A second type is the class of Terminal Connections, which comprises Half-Connections linking Module Ports to Terminal Ports; these are also one-way. A third type is the class of Parameter Connections, which handle algorithm parameters and are bi-directional.

Each Data Connection bears a statically defined parameter indicating the type of the data to be transferred. In fact, the number of bytes per pixel suffices as far as data transfer is concerned. The Communication Pattern does not allow type conversion at data transfer time. If such conversions are necessary, they have to be implemented within the Image Processing functions IP. So, all the Input and Output Ports corresponding to a given Connection deal with the same data type, which justifies a unique parameter per Connection. As far as data transfer is concerned, any number of bytes per pixel is acceptable. However, since the data type is also meaningful at the Image Processing function IP level, the possible values of 1, 2, 4 or 8 bytes per pixel are specified.
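
A minimal sketch of a Data Connection carrying a single, statically defined pixel size, restricted to the values 1, 2, 4 or 8 bytes per pixel. The class, its fields and the `packet_bytes` helper are hypothetical; the 1024-pixel width and 32-line packet in the example are illustrative values only.

```python
from dataclasses import dataclass

ALLOWED_BYTES_PER_PIXEL = (1, 2, 4, 8)


@dataclass(frozen=True)
class DataConnection:
    """One-way Data Connection with one statically defined pixel type."""
    producer: str
    consumer: str
    bytes_per_pixel: int

    def __post_init__(self) -> None:
        # No type conversion happens on the connection, so this single value
        # fixes the pixel type for both the Output Port and the Input Port.
        if self.bytes_per_pixel not in ALLOWED_BYTES_PER_PIXEL:
            raise ValueError(f"unsupported pixel size: {self.bytes_per_pixel} bytes")

    def packet_bytes(self, width: int, lines: int) -> int:
        """Bytes to transfer for a data packet made of `lines` full image lines."""
        return width * lines * self.bytes_per_pixel


if __name__ == "__main__":
    conn = DataConnection("SOURCE", "MOD1", bytes_per_pixel=2)  # 16-bit pixels
    print(conn.packet_bytes(width=1024, lines=32))              # 65536
```
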

The Modules comprise several types, among which: a Source Module, labeled SOURCE, which does not feature any input Data Connection and which is responsible for generating the data to be processed, together with synchronization information; a Sink Module, labeled SINK, which does not feature any output Data Connection and is used as a receptor of processed data; and ordinary Modules, labeled MOD, which are neither Source nor Sink Modules. The Source and Sink Modules are by nature often equipped with input and output Terminal Connections. There is only one Source Module per Communication Pattern; however, there may be several Sink Modules. It is to be noted that terminal output connections do not necessarily emerge from a Sink Module. Terminal Connections may be used in several places of the Communication Pattern, for example to save data on a given medium.
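
The Module types can be derived mechanically from the Data Connections, which also gives a simple consistency check of the rule that there is a single Source and at least one Sink. This sketch assumes that only Data Connections (not Terminal or Parameter Connections) are considered; the function name and the Module names in the example are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def classify_modules(modules: Iterable[str],
                     data_connections: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Label each Module SOURCE, SINK or MOD and check the Source/Sink counts."""
    modules = list(modules)
    has_input = {m: False for m in modules}
    has_output = {m: False for m in modules}
    for src, dst in data_connections:
        has_output[src] = True
        has_input[dst] = True
    kinds = {}
    for m in modules:
        if not has_input[m]:
            kinds[m] = "SOURCE"   # no input Data Connection
        elif not has_output[m]:
            kinds[m] = "SINK"     # no output Data Connection
        else:
            kinds[m] = "MOD"
    sources = [m for m, k in kinds.items() if k == "SOURCE"]
    if len(sources) != 1:
        raise ValueError(f"exactly one Source Module expected, found {sources}")
    if "SINK" not in kinds.values():
        raise ValueError("at least one Sink Module expected")
    return kinds


if __name__ == "__main__":
    print(classify_modules(
        ["acq", "denoise", "display", "store"],
        [("acq", "denoise"), ("denoise", "display"), ("denoise", "store")],
    ))  # {'acq': 'SOURCE', 'denoise': 'MOD', 'display': 'SINK', 'store': 'SINK'}
```
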

The SOURCE Module may feature several Output Data Connections, provided that they deliver synchronous data streams. The Source Module is also responsible for producing a time reference data structure, labeled Time-Ref, which locates every image data packet of a given image sequence. In fact, this reference contains a field that identifies an image index within the sequence of images, and a field bearing the packet index within the current image. Time-Ref is passed, along with the corresponding packet, from Module to Module. It makes it possible to perform several important tasks, namely data locating, management of the Image Processing Function IP, and delay management. Time-Ref is by definition a time reference structure that locates data packets with respect to the image index in the sequence and with respect to the data packet position within the current image. Thus, it is not a data packet reference, in that several distinct data packets may be transferred along with Time-Ref structures containing the same information. In particular, that is the case when a given input data packet produces several output data packets.
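
A minimal sketch of the Time-Ref structure, assuming two integer fields. The field names, `source_time_refs` and `split_output` are hypothetical. Ordering is lexicographic (image index first, then packet index), which is one plausible way to support delay management; the example also shows two output packets sharing the same Time-Ref.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True, order=True)
class TimeRef:
    """Locates a data packet: image index in the sequence, packet index in the image."""
    image_index: int
    packet_index: int


def source_time_refs(image_index: int, packets_per_image: int) -> Iterator[TimeRef]:
    """Time-Ref values a Source Module could emit for one image."""
    for packet_index in range(packets_per_image):
        yield TimeRef(image_index, packet_index)


def split_output(ref: TimeRef, packet: List[int]) -> List[Tuple[TimeRef, List[int]]]:
    """An IP function producing two output packets from one input packet:
    both travel with the same Time-Ref, so Time-Ref does not identify a packet."""
    half = len(packet) // 2
    return [(ref, packet[:half]), (ref, packet[half:])]


if __name__ == "__main__":
    refs = list(source_time_refs(image_index=7, packets_per_image=32))
    print(refs[0], refs[-1])
    outputs = split_output(refs[3], [10, 20, 30, 40])
    print(outputs[0][0] == outputs[1][0])  # True
```
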

With the elements introduced so far, it is already possible to create very sophisticated valid Communication Models. The connectivity rules are presented hereafter:
- each Module is equipped with a unique Parameter Connection;
- each Communication Pattern must contain at least a Source Module and a Sink Module;
- the Source Module is unique;
- several Sink Modules may coexist;
- the Source may feature several data output ports corresponding to synchronous streams;
- the Sink may feature several data input ports;
- an Ordinary Module may feature more than one input and/or output port;
- a data output port may be read by several simple Data Connections;
- a data input port cannot be written into by several simple Data Connections;
- Data Connections may bypass pipeline levels;
- Data Connection loops are not permitted ("no-loop" rule);
- several connections can only be gathered if they are issued from a common scattering and if they feature the same strip transfer frequency;
- partial gathering is possible if it respects the two preceding rules.

Most of the rules listed above are natural or permissive. However, the "no-loop" rule might seem to rule out the possibility of implementing temporal recursive algorithms. This is in fact incorrect. Temporal recursive filters need previous output data as input. But there is no need to recover previous output data from an external path when those data were produced internally. It suffices to provide the developer with a way of memorizing data between consecutive images to overcome the difficulty. Thus, removing data connection loops from the Communication Pattern specifications avoids serious implementation and theoretical problems. Likewise, it is not to be concluded that any feedback is impossible at the inter-module level. The "no-loop" condition only applies to Data Connections; Parameter Connections can perfectly well be used to introduce limited feedback, for instance involving control information.
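
The point about temporal recursive algorithms can be illustrated with a first-order recursive filter (an exponential moving average, chosen here only as a simple example of a temporal recursive filter) whose previous output is memorized inside the Module between consecutive images, keyed by strip index. The class name and the weighting coefficient `a` are illustrative assumptions.

```python
from typing import Dict, List


class RecursiveTemporalFilter:
    """IP function y_t = a * x_t + (1 - a) * y_{t-1}, computed packet by packet.

    The previous output is memorized inside the Module, keyed by packet index,
    so no Data Connection has to loop back from downstream.
    """

    def __init__(self, a: float) -> None:
        self.a = a
        self.previous: Dict[int, List[float]] = {}  # packet index -> last output

    def __call__(self, packet_index: int, packet: List[float]) -> List[float]:
        prev = self.previous.get(packet_index)
        if prev is None:
            out = list(packet)  # first image: nothing to recurse on yet
        else:
            out = [self.a * x + (1 - self.a) * y for x, y in zip(packet, prev)]
        self.previous[packet_index] = out
        return out


if __name__ == "__main__":
    f = RecursiveTemporalFilter(a=0.5)
    print(f(0, [0.0, 4.0]))  # image 0 -> [0.0, 4.0]
    print(f(0, [2.0, 0.0]))  # image 1 -> [1.0, 2.0]
```
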

II) Mapping the Communication Pattern over the Platform

The Communication Pattern is mapped over the Platform. The mapping operation defines the association between Modules and Compute Environments and yields an implementation of the way the Image Processing Functions IP interact with the data transiting over the processor network. This operation comprises phases of partitioning the input data into data packets and transmitting said image data packets.
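
A sketch of how a Module-to-CE mapping could be checked against the physical BUS paths, since a logical connection only works if a physical path exists. The function name, the CE names and the treatment of each BUS link as usable in both directions are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def route_connections(mapping: Dict[str, str],
                      data_connections: List[Tuple[str, str]],
                      bus_links: Set[Tuple[str, str]]) -> Dict[Tuple[str, str], str]:
    """Classify each Data Connection as 'local' or 'bus' for a Module -> CE mapping."""
    routes = {}
    for src, dst in data_connections:
        ce_src, ce_dst = mapping[src], mapping[dst]
        if ce_src == ce_dst:
            routes[(src, dst)] = "local"  # several Modules on the same CE
        elif (ce_src, ce_dst) in bus_links or (ce_dst, ce_src) in bus_links:
            routes[(src, dst)] = "bus"
        else:
            raise ValueError(f"no physical path between {ce_src} and {ce_dst}")
    return routes


if __name__ == "__main__":
    mapping = {"SOURCE": "CE0", "MOD1": "CE1", "MOD2": "CE1", "SINK": "CE2"}
    connections = [("SOURCE", "MOD1"), ("MOD1", "MOD2"), ("MOD2", "SINK")]
    print(route_connections(mapping, connections, {("CE0", "CE1"), ("CE1", "CE2")}))
```
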

According to the invention, the input data are partitioned by the Source Module into data packets, which are data slices called Image Strips. An Image Strip is a packet of consecutive image lines, possibly overlapping with other Image Strips. In the following example of X-ray medical image processing, an image of a sequence is scanned horizontally, and the data arrive along horizontal lines formed of pixels that have to be processed. The image is not processed pixel by pixel, which is not efficient, nor is the sequence processed image by image, which is efficient regarding transfer but not regarding latency. Instead, the image is divided into Image Strips of several lines, corresponding to the image data packets described above. The Image Strips are parallel to the image lines and are transferred for image processing. In fact, each Module introduces a latency that is proportional to the amount of data to process. It has been found, according to the invention, that the use of Image Strips is efficient both for high transfer rates and for low latency. The choice of the number of lines forming one Image Strip determines a compromise between transfer efficiency and latency. Image Strips keep latency to a minimum. Thus, everything previously described in relation to Data Packets applies to Image Strips. In an example of embodiment, an image formed of 1024 lines comprises 32 Image Strips of 32 lines.
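
The partitioning of the example of embodiment (1024 lines, 32 Image Strips of 32 lines) and the resulting time between consecutive strips at 25 images per second can be computed as below. The helper names are hypothetical, and the image is assumed to arrive at a uniform line rate.

```python
from typing import List, Tuple


def strip_bounds(image_lines: int, lines_per_strip: int) -> List[Tuple[int, int]]:
    """[first, last) line ranges of the Image Strips covering one image."""
    return [(start, min(start + lines_per_strip, image_lines))
            for start in range(0, image_lines, lines_per_strip)]


def strip_period_ms(images_per_second: float, image_lines: int,
                    lines_per_strip: int) -> float:
    """Time between two consecutive Image Strips, assuming a uniform line rate."""
    return (1000.0 / images_per_second) * lines_per_strip / image_lines


if __name__ == "__main__":
    strips = strip_bounds(1024, 32)
    print(len(strips))                  # 32
    print(strips[0], strips[-1])        # (0, 32) (992, 1024)
    print(strip_period_ms(25, 1024, 32))  # 1.25 ms
```

Halving the strip height halves the strip period (and hence the per-Module latency under the proportionality stated above) but doubles the number of transfers and synchronizations per image, which is the compromise mentioned in the text.
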

Since most X-ray imaging algorithms involve 2-D neighborhoods, spatial overlapping must be taken into account. Overlapping Areas are needed at the Input Port level. The IP function is provided with regions containing both the active area, formed of the pixels to be processed, and the Overlapping Area, formed of the extra pixels needed to achieve processing within the active area. Since the input regions seen by the IP function should naturally coincide, the overlapping parameter is declared at the Module level. This means that algorithms combining several inputs that require different spatial breadths are in fact provided with regions featuring a unique overlap. Also, a symmetrical overlapping geometry is considered. Since only horizontal Image Strips are considered, the overlapping effect can be taken into account by a sole Module-level parameter relating to the number of overlapping lines on either side of the Image Strip. It is to be noted that spatial overlapping introduces extra strip delays at each Module crossing. Spatial overlapping should not be confused with temporal overlapping, which refers to the ability to make data transfer overlap data processing.
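
A sketch of the input region seen by the IP function for horizontal Image Strips with a symmetrical, Module-level overlap parameter. The function names and the clipping at image borders are assumptions. Under a simple model, it also shows where the extra strip delay comes from: a Module cannot process strip k until the lines of the following strip(s) reached by the overlap have arrived.

```python
import math
from typing import Tuple


def input_region(strip_index: int, lines_per_strip: int, overlap: int,
                 image_lines: int) -> Tuple[int, int]:
    """[first, last) lines handed to the IP function: active area plus `overlap`
    lines on either side, clipped at the image borders."""
    first = strip_index * lines_per_strip
    last = min(first + lines_per_strip, image_lines)
    return max(0, first - overlap), min(image_lines, last + overlap)


def extra_strip_delay(overlap: int, lines_per_strip: int) -> int:
    """Number of following strips a Module must wait for before its region is complete."""
    return math.ceil(overlap / lines_per_strip)


if __name__ == "__main__":
    print(input_region(0, 32, 3, 1024))  # (0, 35): top border, no lines above
    print(input_region(5, 32, 3, 1024))  # (157, 195)
    print(extra_strip_delay(3, 32))      # 1
```
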

For launching a user-defined Image Processing Function IP on the current Image Strip, optimization is realized by achieving temporal inter/intra-module input/output computing overlap over several pre-defined connection means types. The connection means types hereafter called Pipeline, Scatter/Gather and Branch-Connection are defined. Among the Connections cited above, Data Connections are the most important class. All the Connections belonging to the class of Data Connections ensure the repeated transfers of successive Image Strips together with the necessary synchronization information, including Time-Ref. They are all mono-directional. Referring to FIG.1A to FIG.1D, Data Connections comprise:

Simple Connection, labeled PP, which is a point-to-point Connection that can transfer consecutive Image Strips (FIG.1A).

[1/n]-Scatter Connection, labeled SC, which is a point-to-point Connection belonging to a group of n Connections all issued from a common Output Port, and which can transfer Image Strips one after the other at the rate of one Image Strip every n Image Strips. The corresponding emission Port is called Data Scattering Port (FIG.1B).

[1/n]-Gather Connection, labeled GA, which is a point-to-point Connection belonging to a group of n Connections all reaching a common Input Port, and which can transfer Image Strips one after the other at the rate of one Image Strip every n Image Strips, all the Image Strips reaching this Input Port being gathered. The corresponding reception Port is called Data Gathering Port (FIG.1B).

Branch-Connection, labeled BR (FIG.1C), which is a point-to-point Connection similar to a Simple Connection, but linking two Modules belonging to the same branch of a Scatter/Gather Structure.

Examples of Communication Patterns using said Image Strips with minimum latency are described hereafter. They rely on methods of Task Partitioning and methods of Data Partitioning.

Referring to FIG.1A, a Pipeline Structure comprises a Source Module SOURCE, one or several Ordinary Module(s), for example MOD1, MOD2, and a Sink Module SINK. The Simple Connections PP are adapted to the implementation of said Pipeline Structure, which can perform serial combination of algorithms. The Image Processing Functions are applied one after the other along the physical data path linking the involved Modules, forming said Task Partitioning Structure. Each Module activates a given task for all the Image Strips, and several intermediate results can be passed to the next Module as parallel data streams. FIG.1A represents a pure Task Partitioning Structure. A short-cut between the Source and the Sink Modules, or between the Source and an Ordinary Module, is permitted, as illustrated by the connection represented by a dotted line. The latency introduced by Task Partitioning by Pipelining increases with the number of pipeline levels. In fact, several Image Strip time-periods are lost each time a pipeline stage is crossed, but in practice Task Partitioning by Pipelining is very efficient.
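
The growth of latency with the number of pipeline levels can be estimated with a deliberately simplified model, which is an assumption and not the actual timing of the Image Transport Engine: each stage costs one strip period, plus one extra strip period for each following strip its spatial overlap reaches into. The function name and the overlap values in the example are hypothetical.

```python
import math
from typing import List


def pipeline_latency_strips(overlaps: List[int], lines_per_strip: int) -> int:
    """Latency, in strip periods, of a chain of Modules under a simplified model."""
    return sum(1 + math.ceil(ov / lines_per_strip) for ov in overlaps)


if __name__ == "__main__":
    strip_period_ms = 40.0 / 32  # 25 images/s, 32 strips per image
    stages = [2, 5, 0]           # overlap lines of MOD1, MOD2, MOD3 (illustrative)
    n = pipeline_latency_strips(stages, 32)
    print(n, n * strip_period_ms)  # 5 6.25
```

Under this model, a three-level pipeline adds a few strip periods, i.e. a small fraction of the 40 ms frame interval, whereas image-by-image processing would add at least one full frame per level.
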

Referring to FIG.1B, a Scatter/Gather Structure comprises one Source Module SOURCE, at least two Ordinary Modules disposed in parallel in branches, such as MOD1, MOD2, ..., MODn, and a Sink Module SINK. The Scatter Connections SC and the Gather Connections GA are specialized in the Scatter/Gather type of Data Partitioning. For instance, the Image delivered by the Port of the Source Module is sliced into Image Strips that are numbered according to an index. A [1/n]-Scatter Connection distributes the Image Strips evenly over the Modules MOD1 to MODn according to their Strip-Index. Once processed, the scattered Image Strips are automatically put back together by a [1/n]-Gather Connection. Instead of n branches, only two branches may be used to form a Scatter/Gather Structure. In this case, the Image Strips having an odd index are processed on one branch and the Image Strips having an even index are processed on the other branch. FIG.1B represents a pure Data Partitioning Method. A Short-cut between the Source and the Sink Modules is permitted. The Scatter/Gather operation provides the advantages of Data Partitioning while keeping latency low. However, it may raise critical difficulties when applied to algorithms with a wide spatial extent since, by nature, each branch of the scattering structure processes non-contiguous Image Strips. It is mostly advantageous when the required spatial overlaps are limited.
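The index rule of the [1/n]-Scatter and [1/n]-Gather Connections can be sketched as follows. This is a minimal Python illustration, not the Image Transport Engine itself: it assumes 0-based Strip-Indices and treats Image Strips as opaque objects, and the function names `scatter` and `gather` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def scatter(strips: Sequence[T], n: int) -> List[List[T]]:
    """[1/n]-Scatter (hypothetical helper): the strip with index k goes to branch k % n."""
    branches: List[List[T]] = [[] for _ in range(n)]
    for k, strip in enumerate(strips):
        branches[k % n].append(strip)
    return branches


def gather(branches: Sequence[Sequence[T]]) -> List[T]:
    """[1/n]-Gather (hypothetical helper): take one strip from each branch in turn.

    Strip k sits at position k // n of branch k % n, so reading the branches
    round-robin restores the original Strip-Index order.
    """
    n = len(branches)
    total = sum(len(b) for b in branches)
    return [branches[k % n][k // n] for k in range(total)]
```

With n=2, branch 0 receives the even-index strips and branch 1 the odd-index strips, which corresponds to the two-branch case described above.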




Referring to FIG.1C, a Branch Mode Structure comprises a Scatter/Gather Structure having Pipeline Structures in the branches. The Scatter/Gather Structure comprises one Source Module SOURCE, at least two parallel branches comprising Pipeline Structures of Ordinary Modules, and a Sink Module SINK. For instance, the Image delivered by the Port of the Source Module is sliced into Image Strips distributed over n=2 branches. The Image Strips are numbered according to an index; in this example, the indices alternate between 1 and 2. A [1/2]-Scatter Connection SC distributes the Image Strips to MOD1 and MOD2, the first Modules of the branches BR, following the parity of their Strip-Index. The Image Strips are then processed alternately by the Modules MOD3 and MOD4, the second Modules of the branches BR. Once processed, the scattered Image Strips are automatically put back together by a [1/2]-Gather Connection GA. A Short-cut as previously described is permitted.

Referring to FIG.1D, further structures, called Wide-Band Data Partitioning Structures, may be designed as compositions of the preceding Structures, using simple connections PP. The Wide-Band Data Partitioning Structures use a characteristic of the Source Module, namely its ability to convey two synchronous output streams. If, within the Module SOURCE, a delay of half an image is introduced, plus a possible provision for spatially overlapping Image Strips, then it becomes feasible to produce two synchronous streams as if they were emanating from two distinct half-size images. The downstream Modules process these streams independently down to the Module SINK, which features two distinct Input Ports, one for the upper part and one for the lower part of the Image. It then remains to gather the two half images within the Sink Image Processing function and push the final result towards the targeted Terminal Point. More generally, this arrangement can be applied to n consecutive, possibly overlapping, Image Wide-Bands, each made of several consecutive Image Strips. The above-described Task Partitioning by Pipelining can be seamlessly combined with Wide-Band Data Partitioning, as illustrated by FIG.1D.
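The split of an image into n possibly overlapping Wide-Bands can be sketched as below. This is a hedged illustration only: the function name `wide_bands` is hypothetical, the bands are assumed to be cut at line boundaries as evenly as possible (a real Source Module would presumably align them on Image Strip boundaries), and the overlap is assumed to be added only on interior sides.

```python
from typing import List, Tuple


def wide_bands(height: int, n: int, overlap: int) -> List[Tuple[int, int, int, int]]:
    """Split `height` image lines into n consecutive Wide-Bands (hypothetical helper).

    Returns one tuple (active_start, active_end, sent_start, sent_end) per band,
    with half-open line ranges. The sent range extends the active range by
    `overlap` lines on each interior side, clamped to the image.
    """
    bounds = [height * i // n for i in range(n + 1)]
    bands = []
    for i in range(n):
        start, end = bounds[i], bounds[i + 1]
        bands.append((start, end, max(0, start - overlap), min(height, end + overlap)))
    return bands
```

For the two-band case of FIG.1D, `wide_bands(1024, 2, 8)` gives an upper band sent as lines 0 to 519 and a lower band sent as lines 504 to 1023, each keeping 512 active lines.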

Wide-Band Data partitioning has advantage over Task Partitioning by Pipelining, since it 

25 reduces the required number of bandwidths and allows high modularity, so that tasks do not have to 
undergo unnatural slicing in order to reach real-time conditions through pipelining. However, Wide- 
Band Data Partitioning introduces higher latencies. Task Partitioning by Pipelining remains more 
efficient than Wide-Band Data Partitioning. 

III) Defining the Transmission of the Image Strips

Time-Ref locates Image Strips with respect to the current Image Index in the sequence and with respect to the Image Strip position within the current image. As described above, several distinct Image Strips may be transferred along with Time-Ref structures containing the same information, for example when a given input Image Strip produces several output Image Strips. These Image Strips are transmitted one after the other over the Connections. Thus, all data communication and computation occur at the Strip level, which keeps latency very small. Each Module repeatedly receives, processes and transmits the Image Strips. Time-stamped Image Strips are used in such a way that Image Strip transmission overlaps in time with computation, both between Modules and within each Module, so that computing proceeds constantly and the overall operation is highly optimized. For each type of Connection, the precise way these overlapping properties are achieved is explained in detail below. It relies on a proper input/output Image Strip delay policy.

FIG.2 illustrates the transmission of the Image Strips in the Pipeline Structure of FIG.1A according to a technique called Pipelining without Overlap. The SOURCE is defined to produce a predetermined number of adjacent Image Strips (with active areas, without overlapping areas). The Image Strips have a given number of lines, called width w. These Image Strips, produced by the SOURCE, are transmitted one after the other to MOD1, and from MOD1 to MOD2. For instance, MOD1 applies a first image processing function IP1 to each Image Strip, and MOD2 further applies a second image processing function IP2 to the transmitted Image Strips. The references t-4, t-3, t-2, t-1, t are the instants at which the successive Image Strips are processed in MOD1. While the first image processing function IP1 is producing an Image Strip at the instant t, denoted [t]-Image Strip, the Image Strip produced at the previous instant t-1, denoted [t-1]-Image Strip, is already available to be transmitted to MOD2. And the preceding Image Strip, produced at the instant t-2 and denoted [t-2]-Image Strip, is already available in MOD2 to be processed by the image processing function IP2. Thus, in a Pipeline Structure, during the production of a given [t]-Image Strip by a first Module at the instant t, the Output Port of the first Module transmits the previously produced [t-1]-Image Strip. And, during the processing of the [t-2]-Image Strip by a second Module, the Input Port of said second Module receives the next [t-1]-Image Strip. So, each Module is constantly working.
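The activity of each Module at a given instant in FIG.2 can be tabulated with the short Python sketch below. It assumes that MOD1 produces the strip with index k at instant k and that strips are indexed from 0; the function name `pipeline_schedule` is hypothetical.

```python
from typing import Dict, Optional


def pipeline_schedule(n_strips: int, t: int) -> Dict[str, Optional[int]]:
    """Activity at instant t in a SOURCE -> MOD1 -> MOD2 pipeline without overlap.

    Following FIG.2: MOD1 produces strip t, its Output Port transmits strip t-1,
    and MOD2 processes strip t-2. None means that activity is idle at instant t.
    """

    def valid(k: int) -> Optional[int]:
        return k if 0 <= k < n_strips else None

    return {
        "MOD1 produces": valid(t),
        "MOD1 -> MOD2 transmits": valid(t - 1),
        "MOD2 processes": valid(t - 2),
    }
```

Each strip is processed by MOD2 two strip periods after MOD1 produced it, which illustrates why latency grows with the number of pipeline stages.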

FIG.3A illustrates the transmission of Image Strips in the Pipeline Structure of FIG.1A according to a technique called Pipelining with Overlap. The SOURCE is defined to produce a predetermined number of adjacent Image Strips (with active areas, without overlapping areas). The Image Strips have a given number of lines, called width w. In an example, these Image Strips, produced by the SOURCE, are transmitted one after the other to the first Module MOD1, to be processed by a first processing function IP1. Then they are transmitted in the same order to the second Module MOD2, to be processed by a second processing function IP2 that needs a neighborhood of the pixels in order to process them. Thus, two overlapping areas of predetermined width a, one located on each side of the active area, are needed in MOD2 to carry out said second image processing function IP2. The references t-4, t-3, t-2, t-1, t are the successive instants at which the adjacent Image Strips of width w are produced by the first Module MOD1.

While the image processing function IP1 produces a [t]-Image Strip at the instant t in MOD1, it has already produced the [t-1]-Image Strip. The next Module MOD2 needs an active area plus two overlapping areas. Thus, the [t-1]-Image Strip cannot be transmitted, because its overlapping area located in the [t]-Image Strip is not ready. However, the [t-2]-Image Strip produced at t-2 is already available. So, at the instant t, the Output Port of MOD1 sends the [t-2]-Image Strip plus the overlapping area located in the [t-1]-Image Strip. However, it needs to send neither the overlapping area located in the [t-3]-Image Strip nor the area of width a in the [t-2]-Image Strip adjacent to the [t-3]-Image Strip, because these areas are already available in the second Module MOD2. During the production of the [t]-Image Strip, the Image Strip transmitted to the second Module MOD2 thus has width w and is shifted by the width a with respect to the [t-2]-Image Strip, toward the [t-1]-Image Strip. During that transmitting operation, the Module MOD2 processes the [t-3]-Image Strip with its two overlapping areas (one on each side), which are already available in MOD2. So, in the mechanism of Task Partitioning by Pipelining with Overlap, an extra delay is needed between the production of a given Image Strip and the transmission of a previously produced Image Strip. This extra delay is related to the production of the necessary overlapping area that is located in the most recent Image Strip. Using such an extra delay and spatial shift yields an optimal scheme of Image Strip transmission.
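The shifted transmission window of FIG.3A can be made concrete with the Python sketch below. It assumes 0-based strips where strip k covers image rows [k*w, (k+1)*w), that MOD1 produces strip k at instant k, and that 0 < a <= w; the helper names `sent_window` and `rows_needed` are hypothetical. The first window is assumed to also carry rows [0, a), which no earlier window could have sent.

```python
from typing import Tuple


def sent_window(t: int, w: int, a: int, n_strips: int) -> Tuple[int, int]:
    """Rows [start, end) that MOD1 sends to MOD2 at instant t, for 2 <= t <= n_strips + 1.

    At instant t the [t-2]-strip is sent shifted by a toward the [t-1]-strip,
    since its first a rows already reached MOD2 with the previous window.
    """
    j = t - 2                                 # strip whose active area is being completed
    start = 0 if j == 0 else j * w + a        # first window also carries rows [0, a)
    end = min((j + 1) * w + a, n_strips * w)  # include the overlap inside strip j+1
    return start, end


def rows_needed(k: int, w: int, a: int, n_strips: int) -> Tuple[int, int]:
    """Rows MOD2 needs to process strip k: active area plus a rows on each side."""
    return max(0, k * w - a), min((k + 1) * w + a, n_strips * w)
```

Consecutive windows are contiguous, so no row is sent twice, and every row MOD2 needs for strip k has arrived by the end of instant k+2, one period before MOD2 processes strip k at instant k+3.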

FIG.3B illustrates the transmission of Image Strips in the Scattering Structure of FIG.1B according to a technique called Scattering with Overlap. The SOURCE is defined to produce a predetermined number of adjacent Image Strips having a given number of lines, called width w. In the example illustrated by FIG.3B, one Image Strip out of two is transmitted to the first Module MOD1. These Image Strips are transmitted one after the other, to be processed by a first processing function IP1. The Image Strips that are not transmitted to the first Module MOD1 are transmitted to the second Module MOD2, disposed in parallel, to be processed by a second processing function IP2. Thus, Modules MOD1 and MOD2 process alternate Image Strips, which corresponds to the case where the number of Modules of FIG.1B is n=2. There is a gap of lines between the Image Strips processed by MOD1, and likewise between the Image Strips processed by MOD2. For instance, MOD1 processes the Image Strips represented by white bands and MOD2 processes the Image Strips represented by hatched bands; that is, MOD1 processes the Image Strips with odd indices, and MOD2 processes the Image Strips with even indices. The image processing functions of MOD1 and MOD2 are assumed to need a neighborhood of the pixels in order to process them. Thus, two overlapping areas of predetermined width a, one located on each side of the active area, are needed to carry out said image processing functions IP1, IP2. The references t-4, t-3, t-2, t-1, t are the successive instants at which the Image Strips of width w are produced by the SOURCE. While the SOURCE is producing a [t]-Image Strip with an even index at the instant t, it has already produced the previous Image Strips at t-4, t-3, t-2, t-1, and so on. The next Module MOD2 needs an active area having an even index plus two overlapping areas, one on each side, located in the Image Strips having odd indices. Thus, the [t-2]-Image Strip produced at t-2 by the SOURCE, which has an even index and is already available, is transmitted to MOD2 together with two overlapping areas of width a located in the already produced odd-index [t-3]- and [t-1]-Image Strips. So, at the instant t, the Output Port of the SOURCE sends the [t-2]-Image Strip plus said overlapping areas located at its sides. During that transmitting operation, the Module MOD2 processes the even-index [t-4]-Image Strip and its two overlapping areas (one on each side), coming respectively from the odd-index [t-5]-Image Strip and the odd-index [t-3]-Image Strip, which are already available in MOD2. During the transmission of the even-index [t-2]-Image Strip, the SOURCE Output Port cannot transmit the odd-index [t-1]-Image Strip to MOD1, because the overlapping area located in the [t]-Image Strip is not yet ready. So, MOD1 processes the odd-index [t-3]-Image Strip, which was transmitted before the even-index [t-2]-Image Strip. It should be noted that the overlapping areas have to be transmitted together with the active area because they are located in Image Strips whose indices are not otherwise processed by the Module MOD1 or MOD2. For instance, MOD2 receives no lines of the odd-index Image Strips except the lines of the overlapping areas, and MOD1 receives no lines of the even-index Image Strips except the lines of the overlapping areas. As explained above, there are gaps between the Image Strips in MOD1 and MOD2.
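A transmission plan for Scattering with Overlap can be sketched as follows. This Python illustration is an assumption-laden simplification: strips are 0-based (so branch 0 receives the even-index strips and branch 1 the odd-index ones, matching MOD2 and MOD1 above), strip k covers rows [k*w, (k+1)*w), a <= w, and the SOURCE produces strip k at instant k. The function name `scatter_with_overlap` is hypothetical. Unlike FIG.3A, no shift is applied, because the neighbouring strips never reach the same branch.

```python
from typing import List, Tuple


def scatter_with_overlap(
    n_strips: int, w: int, a: int, n_branches: int = 2
) -> List[Tuple[int, int, int, int]]:
    """Plan for a [1/n]-Scatter Connection with overlap (hypothetical helper).

    Returns (send_instant, branch, start_row, end_row) per strip k. Strip k goes to
    branch k % n_branches with both overlapping areas attached. It is sent at
    instant k + 2, once strip k + 1, which holds its lower overlap, is produced.
    """
    height = n_strips * w
    plan = []
    for k in range(n_strips):
        start = max(0, k * w - a)
        end = min(height, (k + 1) * w + a)
        plan.append((k + 2, k % n_branches, start, end))
    return plan
```

For 6 strips of width 8 with a=2, the strip with index 2 is sent at instant 4 to branch 0 as rows 14 to 25, that is its 8 active rows plus 2 overlap rows taken from each odd-index neighbour.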

FIG.3C illustrates the transmission of Image Strips in the Gathering Structure of FIG.1B according to a technique called Gathering with Overlap. The SINK is defined to construct a final image by gathering a predetermined number of adjacent Image Strips, for instance the Image Strips processed according to the Scattering with Overlap described above. The Image Strips have a given number of lines, called width w. In the example illustrated by FIG.3C, these Image Strips are gathered by the Input Port of the SINK. The SINK does not need the overlapping areas processed by IP1 and IP2; it only needs the adjacent active areas to construct the final Image. So, the Image Strips, without their overlapping areas, are sent alternately by MOD1 and MOD2 in the order of the successive instants t-4, t-3, t-2, t-1, t: MOD1 sends the odd-index [t-5]-Image Strip, then MOD2 sends the even-index [t-4]-Image Strip, then MOD1 sends the odd-index [t-3]-Image Strip, then MOD2 sends the even-index [t-2]-Image Strip, then MOD1 sends the odd-index [t-1]-Image Strip, then MOD2 sends the even-index [t]-Image Strip. The SINK constructs the final Image from the active areas only since, when the SINK receives the lines of the [t]-Image Strip, for instance, it already has the lines of the adjacent [t-1]-Image Strip, and so on.
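Gathering with Overlap amounts to dropping the overlapping areas at the emitting Module and concatenating the active areas in Strip-Index order at the SINK. The Python sketch below shows both steps under assumed conventions: strip k's active area covers image rows [k*w, (k+1)*w), each processed strip is a list of lines starting at image row `first_row`, and the helper names `crop_active` and `sink_assemble` are hypothetical.

```python
from typing import Dict, List


def crop_active(k: int, first_row: int, lines: List[str], w: int) -> List[str]:
    """Keep only the active area of strip k, dropping its overlapping areas.

    `lines` holds the processed strip starting at image row `first_row`.
    """
    offset = k * w - first_row  # position of the active area inside `lines`
    return lines[offset:offset + w]


def sink_assemble(active: Dict[int, List[str]]) -> List[str]:
    """Concatenate the active areas in Strip-Index order to build the final image."""
    return [line for k in sorted(active) for line in active[k]]
```

Sorting by Strip-Index makes the assembly independent of which branch delivered a strip first; in the strictly alternating order of FIG.3C, the strips already arrive in index order.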

FIG.3D illustrates the transmission of Image Strips in the Branch Structure of FIG.1C according to a technique called Branch-Connection with Overlap. Regarding the branch formed of Modules MOD1 and MOD3, while the SOURCE is producing a [t]-Image Strip with an even index at the instant t, it has already produced the previous Image Strips at t-4, t-3, t-2, t-1, and so on. It is assumed that the next Module MOD1 needs an active area having an even index plus two overlapping areas, one on each side, located in the Image Strips having odd indices. Thus, the [t-2]-Image Strip produced at t-2 by the SOURCE, which has an even index and is already available, is transmitted to MOD1 together with two overlapping areas of width a located in the already produced odd-index [t-3]- and [t-1]-Image Strips. So, at the instant t, the Output Port of the SOURCE sends the [t-2]-Image Strip plus said overlapping areas located at its sides. During that transmitting operation, the Module MOD1 processes the even-index [t-4]-Image Strip and its two overlapping areas (one on each side), coming respectively from the odd-index [t-5]-Image Strip and the odd-index [t-3]-Image Strip, which are already available in MOD1. During the transmission of the even-index [t-2]-Image Strip, the SOURCE Output Port cannot transmit the odd-index [t-1]-Image Strip to MOD2, the first Module of the other branch, because the overlapping area located in the [t]-Image Strip is not yet ready. So, MOD2 processes the odd-index [t-3]-Image Strip, which was transmitted before the even-index [t-2]-Image Strip. It should be noted that the overlapping areas have to be transmitted together with the active area because they are located in Image Strips whose indices are not otherwise processed by the Module MOD1 or MOD2. Now, assuming that the next Module MOD3 of the branch also needs overlapping areas, the overlapping areas must be accumulated along the branch in order to be used by the chain of Modules within the branch. For instance, if MOD1 and MOD3 each need overlapping areas of width a, MOD1 must receive overlapping areas of width 2a on each side, so that it can also deliver to MOD3 the overlapping areas of width a that MOD3 needs.
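The accumulation of overlapping areas along a branch can be computed as a suffix sum, as in the Python sketch below. It rests on an assumption not spelled out in the text: each Module delivers valid output only on lines for which it had its full neighbourhood, so Module i must receive its own overlap plus those of all Modules after it in the branch. The function name `input_overlaps` is hypothetical.

```python
from typing import List


def input_overlaps(stage_overlaps: List[int]) -> List[int]:
    """Overlap width (lines per side) each Module of a branch must receive.

    stage_overlaps[i] is the neighbourhood half-width required by the i-th
    Module's Image Processing Function, listed in data-flow order.
    """
    result = []
    remaining = sum(stage_overlaps)
    for a in stage_overlaps:
        result.append(remaining)  # own overlap plus all downstream overlaps
        remaining -= a
    return result
```

For the branch MOD1 then MOD3 with overlaps [a, a], the result is [2a, a], matching the example above.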






In the Wide-Band Structure of FIG.1D, the Image Strips are transmitted according to a technique called Wide-Band with Overlap. The SOURCE is defined to produce two Image Halves, each Image Half being formed of Image Strips. The first Image Half is transmitted to the first branch, comprising Modules MOD1 and MOD3. The second Image Half is transmitted to the second branch, comprising the Modules MOD2 and MOD4, disposed in parallel. In each branch, the Image Strips are processed as in Pipeline Structures. The transmission is performed either according to the Pipelining technique illustrated by FIG.2 or, when neighborhoods are needed, according to the Pipelining with Overlap technique illustrated by FIG.3A.

In all the transmission techniques involving overlaps, the delays for producing the Image Strips, the delays for emitting the Image Strips and the delays for processing the received Image Strips are different. So, the techniques comprise steps of adjusting the difference between the instant at which a Module produces the Image Strips and the instant at which said Module emits them. The techniques also comprise steps of adjusting the difference between the instant at which a Module receives the Image Strips and the instant at which said Module processes them. These techniques further comprise steps of fine adjustment related to the overlapping areas.
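The two delays to be adjusted can be quantified, in Image Strip periods, with the Python sketch below. It reproduces FIG.2 (a = 0: emission one period after production, processing two periods after) and FIG.3A (0 < a <= w: two and three periods); extending it to overlaps wider than one strip through ceil(a / w) is our own assumption, not a rule stated in the text. The function name `strip_delays` is hypothetical.

```python
import math
from typing import Tuple


def strip_delays(w: int, a: int) -> Tuple[int, int]:
    """(emission_delay, processing_delay) in strip periods, relative to production.

    - emission_delay: strip k can only be emitted once the ceil(a / w) following
      strips holding its lower overlap are produced, hence 1 + ceil(a / w);
    - processing_delay: the receiving Module processes strip k one period after
      the window completing it has been received.
    """
    emission = 1 + math.ceil(a / w)
    return emission, emission + 1
```

With w = 8, `strip_delays(8, 0)` gives (1, 2) as in FIG.2 and `strip_delays(8, 2)` gives (2, 3) as in FIG.3A.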

IV) Running the Image Transport Engine 

This Image Transport Engine governs the exact behavior of the system at run time. The Image Transport Engine manages the input and output data transfers linking the software Modules to the external world; it manages parameter reading, IP function calling and data locating. This last functionality corresponds to the need to locate Image Strips with respect to space and time, for instance in order to provide the information needed to register delayed data streams. The Image Transport Engine also governs any temporal and/or spatial Image Strip overlapping and any data scattering or gathering operations.

V) Apparatus having computing means for using the Software System and program product to form the Software System

Referring to FIG.4, a medical examination apparatus 150 comprises means for acquiring digital image data of a sequence of images, and a digital processing system 120 for processing these data using the Software System described above. The medical examination apparatus comprises means for providing image data to the processing system 120, which has at least one output 106 to provide image data to display and/or storage means 130, 140. The display and storage means may respectively be the screen 140 and the memory of a workstation 110. Alternatively, said storage means may be external storage means. This image processing system 120 may be a suitably programmed computer of the workstation 130, or a special-purpose processor. The workstation 130 may also comprise a keyboard 131 and a mouse 132.



Claims 

1. Software System, referred to as Image Transport Engine, for processing a sequence of images by deploying Image Processing Functions onto a multiprocessor system called Platform, said Platform generating input image data in order to provide processed output image data, said Software System comprising:

a software data partitioning model, referred to as Communication Pattern, which partitions the images of the sequence using time-stamped data packets, the transfer of which may overlap the execution of said image processing functions.

2. The Software System of Claim 1, wherein the Communication Pattern is formed of nodes linked by arcs; the nodes are Software Modules; the arcs are oriented Connections associated with the Modules through Ports; and each Module activates one Image Processing Function attached to it and manages data transfers and synchronization.

3. The Software System of Claim 2, wherein:

a Module exchanges information with another Module through Ports;

among the Modules, there are one Source Module responsible for generating the time-stamped data packets and a time reference data structure labeled Time-Ref, which locates every image data packet of a given Image Sequence; one or several Sink Modules used as Output Data receptors; and Ordinary Modules connected between the Source Module and the Sink Modules in such a manner that the image data flows in one direction only and in an acyclic manner;

the Source Module has no Input Port and the Sink Modules have no Output Ports; the Ordinary Modules have Input and Output Ports.

4. The Software System of Claim 3, wherein, among the Connections, there are data Connections dealing with Data and specialized in the transfer of image data packets, which are one-way Connections.

5. The Software System of Claim 4, wherein the time reference data structure labeled Time-Ref locates data packets with respect to an image index in the image sequence and with respect to a data packet position within the current image.

6. The Software System of Claim 5, wherein the Source Module partitions the Input Data into data packets that are data slices referred to as Image Strips, an Image Strip being a packet of consecutive image lines, parallel to the image lines, the data arriving along said lines formed of pixels that have to be processed; and Image Strips may overlap other Image Strips.

7. The Software System of Claim 6, comprising the definition of Overlapping Areas for the active area of the Image Strips, which are formed of extra parts of Image Strips located on either side of said active area of the Image Strips.

8. The Software System of Claim 7 for programming a distributed application, comprising steps of transmitting Image Strips with Overlapping Areas between emitting Modules and receiving Modules, steps of adjusting the difference between the instant of production of Image Strips by a Module and the instant of emission of the Image Strips by said Module, and steps of adjusting the difference between the instant of reception of Image Strips by a Module and the instant of processing of the Image Strips by said Module, for performing optimal overlapping between data transfer and data processing.

9. The Software System of one of Claims 6 to 8, wherein the time reference structure labeled Time-Ref locates the Image Strips with respect to the current image index in the sequence and with respect to the Image Strip position within the current image; and the Data Connections ensure repeated transfers of successive Image Strips together with synchronization information including Time-Ref, and all Modules repeatedly receive, process and transmit the Image Strips.

10. The Software System of one of Claims 2 to 9, wherein the Communication Pattern comprises one of the following types of Connections between two Ports:

a Pipe-Line Connection, which is a point-to-point Connection that transfers consecutive Image Strips;

a [1/n]-Scatter Connection, which is a point-to-point Connection belonging to a group of n Connections all issued from a common Output Port, and which transfers one Image Strip every n Image Strips;

a [1/n]-Gather Connection, which is a point-to-point Connection belonging to a group of n Connections all reaching a common Input Port, and which transfers one Image Strip every n Image Strips and gathers all the Image Strips reaching this common Input Port.

11. The Software System of Claim 10, comprising methods of task partitioning and/or methods of data partitioning, among which Task Partitioning Structures using Pipeline Connections, wherein the Image Processing Functions are applied one after the other along the physical data path linking the involved Modules, each Module activating a given task for all the Image Strips.

12. The Software System of Claim 10, comprising methods of task partitioning and/or methods of data partitioning, among which Scatter/Gather type Data Partitioning using a [1/n]-Scatter Connection that distributes the Image Strips to n destination Modules according to their Image Strip-Indices, with possible spatial shifts between Image Strips and time delay adjustments, and/or using a [1/n]-Gather Connection that gathers n Image Strips in a destination Module according to their Image Strip-Indices.

13. The Software System of Claim 10, comprising Data Partitioning Structures using Pipeline Connections and a property of the Source Module, namely its ability to convey two synchronous output streams as if they were emanating from two distinct parts of images, said two parts of images being gathered within the Sink Image Processing function, which pushes the final result towards a targeted Terminal Port.

14. A medical examination imaging apparatus having means for acquiring medical digital image data, using a Software System according to one of the preceding Claims 1 to 13 that has access to said medical digital image data, and having display means for displaying the medical digital images and the processed medical digital images.

15. A computer program product comprising a set of instructions for running the Software 
System as claimed in one of Claims 1 to 13. 




Abstract 

Software System, referred to as Image Transport Engine, for processing a sequence of images by deploying Image Processing Functions onto a multiprocessor system called Platform, which generates input image data in order to provide processed output image data. The Software System comprises a software data partitioning model, referred to as Communication Pattern, which partitions the images of the sequence using time-stamped data packets, the transfer of which may overlap the execution of said image processing functions. The Communication Pattern is formed of Software Modules linked by oriented Connections associated with the Modules through Ports. Each Module activates one Image Processing Function attached to it and manages data transfers and synchronization. The Source Module partitions the Input Data into data packets that are Image Strips, formed of consecutive image lines. The Image Strips may overlap other Image Strips. Overlapping Areas, formed of extra parts of Image Strips located on either side of said Image Strips, can be processed together with said Image Strips.



Application: Medical Image Processing 
FIG.1C 